Cooperativity and Contact Order in Protein Folding 



Marek Cieplak 

Institute of Physics, Polish Academy of Sciences, Al. Lotnikow 32/4-6, 02-668 Warsaw, Poland 



Abstract 

The effects of cooperativity are studied within Go-Lennard-Jones models of 
proteins by making the contact interactions dependent on the proximity to 
the native conformation. The kinetic universality classes are found to remain 
the same as in the absence of cooperativity. For a fixed native geometry, 
small changes in the effective contact map may affect the folding times in a 
chance way and to the extent that is comparable to the shift in the folding 
times due to cooperativity. The contact order controlls folding scenarios: the 
average times necessary to bring pairs of amino acids into their near native 
separations depend on the sequential distances within the pairs. This depen- 
dence is largely monotonic, regardless of the cooperativity, and the dominant 
trend could be described by a single parameter like the average contact order. 
However, it is the deviations from the trend which are usually found to set 
the net folding times. 



PACS Numbers: 87.15.He, 87.15.Cc, 87.15.Aa 
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There are many indications that properties of a protein, such as the folding time, tf a id, 
depend on the topology of its native state [1-3]. Furthermore, the number of possible 
native folds is known to be limited [4] . These two facts provide support for geometry based 
modelling of proteins such as the tube-like [5] and Go-like approaches [6]. There are two 
geometrical parameters that have been hypothesized as setting a scale for tf Q i d : N - the 
number of amino acids (or the backbone length of a protein) and CO - the relative contact 
order parameter [2]. The latter is defined as an average sequential distance between pairs 
of amino acids that interact, or make a contact, and normalized by N. A compilation of 
tf id that were measured at room temperature [2,3] suggested a correlation with CO but no 
dependence on N. On the other hand, theoretical studies yielded tf old growing with N [7,8] 
and not depending on CO [8]. Is it the theories or the interpretation of the experimental 
data that are wrong? 

Jewett, Pande and Plaxco [9] have recently suggested that the theories do not properly 
account for cooperativity effects, i.e. for the fact that the strength of effective amino acid 
interactions should depend on a conformation. One of the reasons for such a dependence is 
that changes in the conformation may lead to variations in the degree of exposure to water 
molecules. Specifically, Jewett et al. have considered a 27-mer lattice Go model in which 
the contact energy, e', is related to the native contact energy, e, through 

£ ' = Pe ' P= (l-s)Q + s ' (1) 

where Q is the fraction of established native bonds and s is a control parameter which 
introduces a conformation dependence for s > 1. The result of their simulations is that tfou 
correlates with CO at the 57% correlation level for s=3 as compared to 5% for s—1 and to 
80% reported in the experimental data. A related study of a similar model by Kaya and 
Chan [10] indicates an even higher degree of correlation induced by this kind of cooperativity. 

What would this prescription for cooperativity yield in off-lattice models of actual pro- 
teins? In this Paper, we consider 14 a-type (no native /9-sheets) and 16 /3-type (no native 
helices) short proteins which were studied in Ref. [8]. The model is coarse-grained, i.e. it 
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involves only the C a atoms, it is Go-like, i.e. the potential is chosen so that the ground 
state agrees with the experimentally determined native structure and it also favors the na- 
tive sense of chirality. We describe the contact interactions by the Lennard- Jones potential 
which is scaled by the energy parameter e' as in eq. (1). This parameter is identical for 
all native contacts. We take s=3 and adopt an adiabatic way to smooth out any sudden 
changes in Q during the time evolution. 

We find that the scaling of tf \ d with N and the kind of the dependence on CO do not 
change on switching from s — 1 to s — 3, but the folding times become, on average, about 
2.25 longer since the couplings provide a weaker pull at the begining of the folding process. In 
particular, we confirm existence of the structure related kinetic universality classes [8] . Even 
though the single CO parameter itself does not correlate with tf ia, the contact order, in a 
more general sense, is important for the folding scenarios. One can characterize the folding 
scenarios by plotting the average first times, t c , needed to establish specific native contacts 
as a function of the corresponding sequence distance. We find that the folding scenarios are 
governed by the distance itself in a fairly monotonic way. However, the control is incomplete 
and often it is the out-of-trend deviations that set the time to form the last contact, i.e that 
set tfoid- Thus the local structures, like helices, do tend to form first (in agreement with 
numerous experimental findings) but the way the non-local structures form is not necessarily 
in agreement with the contact order. Furthermore, we find that the effects of cooperativity 
may be less important than those of the precise determination of the contacts considered 
native in the Go model. Removing some contacts from the native set or declaring some 
reasonable non-native contacts to be effectively native may affect tf u more significantly 
than the tinkering with the strength of the couplings. These kinds of adjustments in the 
contact map physically correspond to considering sets of sequences which are different and 
yet folding to nearly the same conformations, i.e. belonging to the same native fold. 

An important feature of our studies is that for each model protein we determine the 
dependence of tf u on the temperature, T, and take tf Q u at the optimal temperature, T min , 
as the characteristic duration of folding that is relevant for scaling studies. When one 
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considers measurement or calculation at a fixed value of T (such as the room temperature) 
then this T can be in any distance away from the minimum of the typically U-shaped 
dependence of tf Q id on T. Thus such, essentially arbitrary, choices of T may significantly 
affect comparatory analysis of proteins. The arbitrariness is removed if the kinetics are 
observed at T min . The experimental studies do not involve optimization of this kind. 

When constructing the model, we follow references [11]. Briefly, the amino acids are 
represented by particles of mass m located at the positions of the C a atoms. They are 
tethered by a strong harmonic potential with a minimum at the peptide bond length. The 
native structure of a protein is taken from the Protein Data Bank [12] and the interactions 
between the amino acids are divided into native and non-native. The distinction is based 
on the atomic representation of the amino acids in the native state. We check for overlaps 
between the atoms by associating spherical volume to them. The assigned radii are 1.24 
times the van der Walls values and the multiplication factor accounts for the softness of the 
potential [13]. The overlapping amino acids (i and j) are considered to be making contacts. 
The resulting C a — C a contact separation ranges between 4.3 and 12.8 A. These pairs are 
endowed with the Lennard- Jones potential in which the length parameter <7jj is chosen pair 
by pair so that the native C a - C a distance agrees with the minimum of the potential. The 
non-native contacts are purely repulsive (in the basic model) and truncated at the distance 
of 2 1 / 6 cr where a = 5A During the time evolution, a native contact is considered to be 
established when the distance between the amino acids involved is less than 1.5 a^. The 
thermal fluctuations away from the native state are accounted for through the Langevin 
noise with the damping constant 7 of 2 m/r, where r is \Jma 2 /e. This leads to negligible 
inertial effects [8] but a more realistic account of the water environment requires 7 to be at 
least an order of magnitude larger. However, at T min has been found to be linear in 7 
[11] so the physical time scales can be accessed by a simple rescaling. 

Figure 1 illustrates the way the model with the cooperativity effect is defined by showing 
the time evolution of p. The adiabatic way to incorporate variations in p is as follows. 
The integration of the equations of motion is based on the discretization of r into 200 
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segments. With each time advancement by t^jT, Q that enters eq.l becomes updated 
through Q new = 0.99Q oM + 0MQ current to eliminate rapid jumps in Q curre nt- {Qcurrent, the 
instantaneous value, is meant as Q in Figure 1). The resulting p is | in the fully unfolded 
state as then Q is zero. In the native state, Q — p — 1. It is seen that at about 1/3 through 
of the folding evolution, the variations in p start mirroring those in Q. The inset of Figure 

1 shows that the initial reduction in p, compared to the native value, results in a longer 
tf id- However, the results for tf Q id at s=3 correlate strongly with those at s—1. It should 
be noted that the cooperativity effect is expected to enhance the thermodynamic stability 
as it makes non-native local energy minima less stable relative to the native native state. 
On the other hand, we have found that the values of T min become lower (in the record cases 
by 0.12). These two effects combined suggest that the energy landscape gets sculpted in a 
way that enhances the folding funnel. 

Figure 2 shows that cooperativity does not affect the scaling curves. The largest value 
of N considered here is 154 and the smallest - 35. In this range, it is hard to distinguish 
between the power law and the exponential dependencies, even though the correlation levels 
for the a proteins favor the former slightly. However, there continues to be a support for 
existence of kinetic universality classes that depend on the type of the secondary structure. 
When the power law fits are used, the exponent for the a-proteins is about 1.7 and for the 
/3-proteins - about 3.2 (in the mixed case it is about 2.5). The scaling trends seen in Figure 

2 become disturbed, but still identifiable, when calculations are performed not at T min but 
at a fixed T. 

Cooperativity does not affect the dependence on CO either, as is shown in Figure 3. It is 
seen that a given value of CO may correspond to a big span of the values of tf Q id and there is 
no trend that can be demonstrated. Significant variations in tf Q id can be obtained by staying 
with one native geometry and adjusting the list of contacts that are considered native in the 
dynamics. We focused on three proteins: lrpo, lcsp, and lefn with the calculated numbers 
of the contacts of 194 (N=61), 169 (N=67), and 150 (N=57), respectively. We identified 
all non-native contacts with the spatial C a -C a range of less than 12 A and considered 
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systems with 5, 10, 20, 30, and 40 such contacts, chosen randomly, as providing additional 
active contacts. In addition, we considered systems in which 7 long range native contacts 
were removed from the native list. We studied variations of tf id with CO within the three 
families of systems, each consisting of 7 members. The family of lrpo shows a growing trend 
with CO. This trend gets disturbed in the case of lefn. On the other hand, the variations 
around lcsp are chaotic. It should be noted that the variations within the families are not 
significant in the plots on the iV-dependence, even in the case of lrpo. It should also be 
pointed out that our data contain two pairs of proteins with identical values of N (2abd and 
limq in the a case and ltit and lten in the (3 case) and in these pairs tf Q id is in fact longer 
for the protein with the bigger CO. 

In the Author's opinion, the experimental evidence, at the fixed T, for the trends in CO 
is not definitive. The inset of Figure 4 presents these data [3] separately for the a- and (3- 
proteins (the data in the original paper are not plotted split into the three structural classes: 
a, P, and a — (3). If one focuses just on the (3 proteins then it emerges that four very different 
folding times correspond to almost the same CO. A similar point is demonstrated in the 
inset of Figure 5 which presents the /9-protein entries in the data compiled by Galzitskaya 
et al. [14]. The /^-proteins form the crucial test case of the approach since they involve long 
range contacts. In the case of a-proteins, CO is more a measure of the helical content, h, 
in the protein than a measure of the sequence range which is short. The lower right panel 
of Figure 6 shows that h, if non-zero, is in fact anticorrelated with CO to a fair degree (the 
correlation coefficient of 0.74). 

The folding scenarios, however, do depend on the contact order. Not on its average value 
but on the full set of values. This is illustrated in Figures 4 and 5 for the lrpo (a-type) and 
lcsp (/3-type) proteins. The figures show t c as a function of the sequence length \j — i\. In 
the case of the helical lrpo, the dependence is monotonic and as such it could be represented 
by a single parameter, e.g. the relative (or average) contact order. Note that there is no 
qualitative difference between the s=3 and s = 1 cases and the same goes to lcsp (s = 1 
not shown in Figure 5). Thus cooperativity does not introduce any new features in the 
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folding scenarios other than general shifts. Note that when 30 additional contacts that are 
consistent with the native topology are introduced in lrpo (s=3) then the shifts in the data 
points are of the size that is comparable to the very introduction of the cooperativity. Thus 
cooperativity acts as if it was affecting the number of effective contacts. 

Figure 5 shows that the /9-protein lcsp has a structured form of the plot of t c vs. \j — i\. 
This kind of branched form is found in many other proteins both of the /3-type, like in ltit, 
and of the a-type, like in lf63, lycc, or 256b. In lcsp, there are many contacts of the same 
— which are established at different times. More importantly, the last to form are not the 
longest ranged contacts corresponding to = (1,62) and (1,64) but the medium ranged 
(9,41), (8,43) and then (6,44), (6,45). Adding the 20 contacts to lcsp shifts the pattern 
downward but does not affect it in any fundamental manner. Note that the formation time 
of the shortest ranged contacts is not affected by the additional contacts. The acceleration 
of folding begins in the second branch of contacts around \ j — i\ of 11. 

We have checked that the folding scenarios are not sensitive to the details of the Go- 
modeling, such as the choice of the contact potential (the 10-12 potential instead of the 
Lennard- Jones) or to the addition of terms that depend on angular parameters, such as the 
dihedral angles. Furthermore, we have considered other variants of the cooperativity effects. 
One of them is to replace Q by the ratio of the full potential energy of the system to its 
native value. The results are qualitatively similar to those reported here but the range of 
variations of the p parameter during the simulations is reduced, making it less effective, 
compared to the contact based definition. 

We conclude that incorporation of elements of cooperativity in the couplings does not 
affect the folding process in the off-lattice Go models in any qualitative manner other than 
making the folding times longer despite the better sculpted folding funnel. Notice that the 
parameter Q does not generally couple to the contact order. Thus it is puzzling why its 
incorporation into the cooperativity effects appears to enhance the CO-dependence in the 
N=27 lattice models. [9,10] Perhaps the lattice geometry itself imposes some sort of coupling 
that emerges when considering systems of a fixed value of N. 
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Since the off-lattice Go models, with or without the cooperativity, do not yield a correla- 
tion of the folding times with CO it is interesting to ask whether some new quality arises if 
CO is replaced by the helical content parameter in the case of a and a — [3 proteins. Figure 
6 shows that this is not so. The experimental data points compiled by Galzitskaya et al. [14] 
on two-state proteins containing helices (the lower left panel) do anti-correlate with h (the 
correlation coefficient is 0.68), though not as strongly as they correlate with CO (not shown; 
the correlation coefficient is 0.86). On the other hand, our model calculations (the top two 
panels) remain uncorrelated both with h and with CO even though some tendency to grow 
with h might be identified in the a — j3 case. We should reemphasize that the experimental 
data do not pertain to the characteristic folding time that could be uniquely associated with 
a protein. The characteristic time must be obtained through an optimization process, i.e. it 
must be measured at T min . A room temperature based measurement need not coincide with 
the conditions at T min and can yield almost any value of tf Q id, depending on the the precise 
choice of the control parameters such as T and pH. 

We have considered a well controlled model of proteins and demonstrated lack of depen- 
dence of the folding times on the relative contact order parameter, irrespective of whether 
the cooperativity is taken into account or not. We have also demonstrated that, for a fixed 
geometry, or a fixed native fold, one can get very different folding times depending on the 
sequence, i.e. depending on what precisely constitutes the set of active contacts. These 
findings do not mean that geometry of the native state is irrelevant. Rather, they mean 
that the single relative contact order parameter may be inappropriate to characterize it. It 
is the full native contact map that has a kinetic meaning. 

MC appreciates fruitful discussions with T. X. Hoang and his help. The correspondence 
with K. W. Plaxco and especially his gift of reference [9] before publication are also appre- 
ciated. A. Sienkiewicz helped polish the manuscript. This work was funded by KBN (grant 
2 P03B 032 25). 



8 



REFERENCES 

[1] D. Shortle and M. S. Ackerman, Science 2001;293:487-489; R. Unger and J. Moult, J. 
Mol. Biol. 259 988-994 (1996). 

[2] K. W. Plaxco, K. T. Simons, and D. Baker, J. Mol. Biol. 277 985-994 (1998). 

[3] K. W. Plaxco, K. T. Simons, I. Ruczinski, D. Baker, Biochemistry 39 11177-11183 
(2000). 

[4] C. Chothia, Nature 357, 543-544 (1992). 

[5] A. Maritan, C. Micheletti, A. Trovato, and J. R. Banavar, Nature 406 287 (2000); J. 
R. Banavar and A. Maritan, Rev. Mod. Phys. 75 23-34 (2003). 

[6] H. Abe, N. Go, Biopolymers 20, 1013-1031 (1981); S. Takada, Proc. Natl. Acad. Sci. 
USA 96, 11698-11700 (1999). 

[7] D. Thirumalai, J. Physique I 5 1457-1467 (1995); A. M. Gutin,V. I. Abkevich, and 
E. I. Shakhnovich, Phys. Rev. Lett. 77 5433-5436 (1996); N. Koga and S. Takada, J. 
Mol. Biol. 313 171-180 (2001); V. P. Zhdanov, Europhys Lett. 42 577-581 (1998); M. 
Cieplak, T. X. Hoang, and M. S. Li, Phys. Rev. Lett. 83 1684-1687 (1999). 

[8] M. Cieplak and T. X. Hoang, Biophysical J. 84 475-488 (2003). 

[9] A. I. Jewett, V. S. Pande, and K. W. Plaxco, J. Mol. Biol. 326, 247-253 (2003). 

[10] H. Kaya and H. S. Chan, Proteins (in press) and cond-mat 0304231. 

[11] T. X. Hoang and M. Cieplak, J. Chem. Phys. 112, 6851-6862 (2000); T. X. Hoang and 
M. Cieplak, J. Chem. Phys. 113, 8319-8328 (2001); M. Cieplak and T. X. Hoang, Int. 
J. Mod. Phys. C 13, 1231-1242 (2002). 

[12] F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer Jr., M. D. Brice, J. 
R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol. 112, 535-542 
(1977). 

9 



[13] J. Tsai, R. Taylor, C. Chothia, and M. Gerstein, J. Mol. Biol. 290, 253-266 (1999); 
G. Settanni, T. X. Hoang, C. Micheletti, and A. Maritan, Biophys. J. 83, 3533-3541 
(2002). 

[14] O. V. Galzitskaya, S. O. Garbuzynskiy, D. N. Ivankov, and A. V. Finkelstein, Proteins 
51, 162-166 (2003). 



FIGURE CAPTIONS 

Fig. 1. The temporal behavior of Q (the thinner line) and p (the thicker line) in an example 
of a folding trajectory in the Go-like model of the lcsp protein with s=3. The inset 
shows the median folding times determined with the cooperativity factor (s=3) plotted 
vs. the folding times without any cooperativity effects (s=l). The hexagons are for 
the a-proteins and the stars for the /9-proteins. The straight line corresponds to a 
'conversion factor' of 2.25 (±0.2). 

Fig. 2. The iV-dependence of the median folding times, defined as the 'first passage times', 
obtained based on at least 101 trajectories. The top panels are for the a-proteins and 
the bottom ones for the /9-proteins. In the left-hand panels, the N scale is logarithmic 
and the corresponding power law exponents are indicated. In, the right-hand panels, 
the N scale is linear and the corresponding correlation lengths are indicated. The 
PDB codes of the a-proteins studied are lce4, lbba, 2pdd, lbw6, lrpo, lhp8, lail, 
2abd, limq, limb, lycc, lhrc, 256b, lf63. The codes of the /3-proteins are: lcbh, lixa, 
led7, lbq9, lefn, 2cdx, lcsp, 2ait, lbdo, ltit, lten, lwit, lwho, 6pcy, lksr, 4fgf. The 
correlation levels of the power law (exponential) fits are 0.978 and 0.956 (0.960 and 
0.971) for the a and (5 proteins respectively. 

Fig. 3. The dependence of tf a u on the relative contact order for the a- (triangles) and j3- 
proteins (circles). The dotted line separates the values of CO that were found for the 
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a- from those found for the /3-proteins. The filled symbols correspond to three selected 
proteins lrpo (circles), lcsp (squares), and lefn (stars) in which extra contacts were 
added or some contacts were subtracted. The families of such systems are connected 
by lines. The results corresponding to the true native contact maps are shown by 
the larger symbols. Within each family, tj u at an individually determined optimal 
temperature is displayed. If a fixed temperature is used instead (the one which is 
optimal for the true native contact map) the plots would look similar in character. 
The folding times of 7019 r for lf63 (CO=0.1291), 5024 r for 6pcy (CO=0.2448), and 
19600 r for 4fgf (CO=0.1873) are beyond the vertical scale of this figure. 

Fig. 4. The folding scenarios for the lrpo protein as described by the average time to 
form a contact corresponding to the sequence length \j — i\. The squares correspond 
to the standard Lennard- Jones Go-like model whereas the stars are for the systems 
with the couplings modified by the cooperativity effect. The small dots indicate data 
obtained when 30 non-native contacts are added randomly (with the condition that 
the contacts formed are shorter than 12 A in the physical space) and the cooperativity 
factor p corresponds to s=3. The data are based on 200 trajectories at T min . The inset 
shows compilation of the experimental results, based on the data from [3]. The relative 
contact order COp is calculated somewhat differently than CO in that it involves non- 
hydrogen atoms in a distance less than a cutoff value of 6 A as discussed further in ref. 
[8]. CO involves only the C a atoms but existence of a contact is based on the atomic 
overlap. 

Fig. 5. The folding scenario for the lcsp protein with the cooperativity effect included. The 
small dots indicate data obtained when 20 non-native contacts are added randomly. 
The inset shows the two-state protein data compiled by Galzitskaya et al. [14] and 
plotted vs. the relative contact order parameter, COq, as calculated by them - usually 
it coincides with COp. 
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Fig. 6 The dependence on the helical content parameter h defined as the ratio of the 
number of amino acids that belong to a-helices to the total number of amino acids 
in a protein. The top two panels shows results of the model calculations for s—1 as 
obtained in Ref. [8] for the a — (3 and a proteins respectively. The proteins considered 
are those listed in this reference. The bottom panel on the right shows the dependence 
of CO on h for the same proteins. The bottom panel on the left plots the experimental 
data points [14] for a different set of proteins (there is a substantial overlap with the 
set considered in the simulations). 
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